Constructing Large Proposition Databases

Authors

  • Peter Exner
  • Pierre Nugues
Abstract

Using semantic parsing or related techniques, it is possible to extract knowledge from text in the form of predicate–argument structures. Such structures are often called propositions. With the advent of massive corpora such as Wikipedia, it has become possible to systematically analyze a wide range of documents covering a significant part of human knowledge and to build large proposition databases from them. While most approaches focus on shallow syntactic analysis and do not capture the full meaning of a sentence, semantic parsing goes deeper and discovers more information from text with higher accuracy. This deeper analysis can be applied to discover temporal and location-based propositions in documents. Medical researchers could, for instance, discover articles regarding the interaction of bacteria in a specific body part. Christensen et al. (2010) showed that using a semantic parser in information extraction can yield extractions with higher precision and recall in areas where shallow syntactic approaches have failed. This accuracy comes at the cost of parsing time. However, in recent years, statistical parsing, and especially semantic parsing, has become increasingly accurate and efficient in analyzing text. This Master’s thesis describes the creation of multilingual proposition databases using generic semantic dependency parsing. Using a broad-domain corpus, Wikipedia, we extracted, processed, clustered, and evaluated a large number of propositions. We built an architecture that provides a complete pipeline covering the input of text, the extraction of knowledge, and the storage and presentation of the resulting propositions. Furthermore, our system is able to handle large-scale extractions, wide domains, and multiple input languages. Wherever possible, the handling of information is automated so that manual labor is kept to a minimum. 
Proposition databases like the one we constructed, combined with other lexical databases, are expected to be key components in semantic search technology, machine translation, and question and answer (Q&A) systems.
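
To make the notion of a proposition concrete, the following is a minimal sketch of how a predicate–argument structure might be represented and queried. It is not the architecture described in the thesis: the `Proposition` and `PropositionStore` names, the in-memory index, and the PropBank-style role labels (`A0`, `A1`, `AM-LOC`, `AM-TMP`) are all assumptions made for illustration, and the example propositions are hand-written rather than parser output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """One predicate-argument structure (hypothetical schema)."""
    predicate: str
    arguments: tuple  # (role, text) pairs, e.g. ("A0", "the bacteria")
    language: str = "en"

    def argument(self, role):
        """Return the text filling `role`, or None if the role is absent."""
        for r, text in self.arguments:
            if r == role:
                return text
        return None


class PropositionStore:
    """Toy in-memory store indexed by predicate (assumed design)."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def add(self, prop):
        self._by_predicate[prop.predicate].append(prop)

    def find(self, predicate, roles=None):
        """Return propositions with this predicate whose arguments
        contain the given substrings, e.g. roles={"AM-LOC": "gut"}."""
        roles = roles or {}
        return [
            p for p in self._by_predicate.get(predicate, [])
            if all(s.lower() in (p.argument(r) or "").lower()
                   for r, s in roles.items())
        ]


if __name__ == "__main__":
    store = PropositionStore()
    # Hand-written examples; role labels are assumed, not parser output.
    store.add(Proposition("interact", (("A0", "the bacteria"),
                                       ("A1", "with host cells"),
                                       ("AM-LOC", "in the gut"))))
    store.add(Proposition("publish", (("A0", "the author"),
                                      ("A1", "the thesis"),
                                      ("AM-TMP", "in 2012"))))
    # Location-based query: where does an "interact" event take place?
    for hit in store.find("interact", {"AM-LOC": "gut"}):
        print(hit.predicate, hit.argument("A0"), hit.argument("AM-LOC"))
```

Keeping temporal and locational modifiers as separate roles is what would allow a query such as "interactions of bacteria in a specific body part" to filter on the location argument alone; a large-scale system would replace the dictionary with a persistent database.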

Similar resources

Constructing a Legal Database on Quixote

Legal reasoning is one application of large-scale knowledge information processing, where artificial intelligence, natural language processing, databases and other technologies are integrated. It is the target for the next generation of databases. In order to investigate whether or not the deductive object-oriented database (DOOD) language/system QUIXOTE is effective in legal reasoning, we are bot...

Comparison among Suffix Array Construction Algorithms

A suffix array is a compact data structure for searching matched strings from text databases. It is an array of pointers and stores all suffixes of a text in lexicographic order. Because its memory requirement is less than that of tree structures, it is effective for large databases. Moreover, constructing the suffix array is used in the Block Sorting compression scheme. We compare algorithms for constructing suffix a...

Focused Entailment Graphs for Open IE Propositions

Open IE methods extract structured propositions from text. However, these propositions are neither consolidated nor generalized, and querying them may lead to insufficient or redundant information. This work suggests an approach to organize open IE propositions using entailment graphs. The entailment relation unifies equivalent propositions and induces a specific-to-general structure. We create...

Tailoring Pattern Databases for Unsolvable Planning Instances

There has been an astounding improvement in domain-independent planning for solvable instances over the last decades and planners have become increasingly efficient at constructing plans. However, this advancement has not been matched by a similar improvement for identifying unsolvable instances. In this paper, we specialise pattern databases for dead-end detection and, thus, for detecting unsol...

The Foundations: Logic and Proof, Sets, and Functions

Learning to construct good mathematical proofs takes years. There is no algorithm for constructing the proof of a true proposition (there is actually a deep theorem in mathematical logic that says this). Instead, the construction of a valid proof is an art, honed after much practice. There are two problems for the beginning student—figuring out the key ideas in a problem (what is it that really...

Publication date: 2012